RNA-Seq Data Analysis ◾ 205
# Remove rows (genes) without an Entrez ID
i <- is.na(y$genes$ENTREZID)
y <- y[!i, ]
# Create the design matrix: no intercept, one column per condition
condition <- factor(sampleinfo$condition)
design <- model.matrix(~ 0 + condition)
# Filter out lowly expressed genes, then recompute library sizes
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes=FALSE]
# Compute normalization factors (TMM by default); the counts themselves are unchanged
yNorm <- calcNormFactors(y)
# Estimate dispersions
yNorm <- estimateDisp(yNorm, design)
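To make the design-matrix and filtering steps more concrete, here is a minimal base-R sketch on a hypothetical 4-gene, 4-sample count matrix with made-up library sizes. It shows the columns that model.matrix(~ 0 + condition) produces and a simplified counts-per-million (CPM) filter. The filter rule used here (CPM of at least 1 in at least as many samples as the smallest group) is an assumption chosen for illustration. filterByExpr() in edgeR applies a more elaborate rule, including a minimum total count and a CPM cutoff derived from library size, so this is not a reimplementation of that function.

```r
# Hypothetical counts: 4 genes x 4 samples (2 normal, 2 tumor)
counts <- matrix(c(100, 0, 5000, 3,    # S1
                   120, 1, 4500, 2,    # S2
                   300, 0, 6000, 1,    # S3
                   280, 2, 5500, 1),   # S4
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("S", 1:4)))

# Hypothetical library sizes (total reads per sample across all genes)
lib_size <- c(S1 = 2e6, S2 = 2.5e6, S3 = 2e6, S4 = 2.5e6)

# Counts per million: divide each column by its library size, scale by 1e6
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Design matrix without intercept: one indicator column per condition
condition <- factor(c("normal", "normal", "tumor", "tumor"))
design <- model.matrix(~ 0 + condition)
print(design)

# Simplified filter (assumption): CPM >= 1 in at least as many samples
# as the smallest group contains
min_group <- min(table(condition))
keep <- rowSums(cpm >= 1) >= min_group
print(keep)

counts_kept <- counts[keep, , drop = FALSE]
```

In this toy example, gene2 and gene4 are removed because they reach 1 CPM in fewer than two samples, while gene1 and gene3 are kept.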
Once the script above has run without errors, you can use the “vidger” functions to
create plots as follows.
FIGURE 5.30 Box plots showing the distribution of normal and tumor counts in CPM.
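Figure 5.30 shows count distributions on the CPM scale. For a comparable view without vidger, the following base-R sketch draws one box per sample from log2(CPM + 1), colored by condition. The simulated Poisson counts, sample names, and colors are hypothetical. Note that edgeR's cpm(y, log = TRUE) uses normalized library sizes and a prior count of 2 by default, so its values differ slightly from the simple +1 offset used here.

```r
set.seed(1)

# Hypothetical counts: 500 genes x 6 samples (3 normal, 3 tumor)
counts <- matrix(rpois(500 * 6, lambda = 50), nrow = 500,
                 dimnames = list(NULL, paste0("S", 1:6)))
condition <- factor(rep(c("normal", "tumor"), each = 3))

# Counts per million, then log2 with a +1 offset to avoid log2(0)
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
log_cpm <- log2(cpm + 1)

# One box per sample, colored by condition
box_cols <- c("skyblue", "salmon")
boxplot(log_cpm, col = box_cols[condition], las = 2,
        ylab = "log2(CPM + 1)",
        main = "Per-sample distribution of log-CPM")
legend("topright", legend = levels(condition), fill = box_cols)
```

Plotting on the log scale keeps a few very highly expressed genes from compressing the rest of the distribution into the bottom of each box.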